Towards a General Technique for Transformation of Nominal Features into Numeric Features in Supervised Learning
نویسندگان
چکیده
Almost all of the machine learning problems require data preprocessing. This stage is especially important for problems where the datasets contain features of mixed types (i.e. nominal and numeric). An often practice in such cases is to transform each nominal features into many dummy (i.e. binary) features. Also many classification algorithms have preference of numeric attributes over nominal attributes, and sometimes the distance between different data points cannot be estimated if the values of the attributes are not numeric and normalized. One way to transform nominal into numeric features is to use the Weight of Evidence (WoE) technique. WoE has some properties that make it very useful tool for transformation of attributes, but unfortunately there are some preconditions that need to be met in order to calculate it. Additionally WoE originally works only on supervised learning problems where data is labelled with two classes. In this paper we propose modified calculation of the Weight of Evidence that overcomes these preconditions, and additionally makes it usable for test examples that were not present in the training set. The proposed transformation can be used for all supervised learning problems and arbitrary number of classes. This paper establishes the theoretical background for such modifications, and does not present any comparative results with other similar techniques.
منابع مشابه
Emotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
متن کاملMEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection
Fraud detection is one of the ways to cope with damages associated with fraudulent activities that have become common due to the rapid development of the Internet and electronic business. There is a need to propose methods to detect fraud accurately and fast. To achieve to accuracy, fraud detection methods need to consider both kind of features, features based on user level and features based o...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملAn Evolution Strategies Approach to the Simultaneous Discretization of Numeric Attributes
Many data mining and machine learning algorithms require databases in which objects are described by discrete attributes. However, it is very common that the attributes are in the ratio or interval scales. In order to apply these algorithms, the original attributes must be transformed into the nominal or ordinal scale via discretization. An appropriate transformation is crucial because of the l...
متن کاملAutomated Detection of Multiple Sclerosis Lesions Using Texture-based Features and a Hybrid Classifier
Background: Multiple Sclerosis (MS) is the most frequent non-traumatic neurological disease capable of causing disability in young adults. Detection of MS lesions with magnetic resonance imaging (MRI) is the most common technique. However, manual interpretation of vast amounts of data is often tedious and error-prone. Furthermore, changes in lesions are often subtle and extremely unrepresentati...
متن کامل